Learning to Predict Movie Ratings from the Netflix Dataset
Abstract
In this paper, we describe a hybrid recommendation system that combines the two main approaches to recommendation: collaborative filtering and content-based classification. We build a collaborative filtering framework that constructs a user-item matrix of ratings and produces recommendations based on user-user similarity computed using Pearson correlation. We tackle the sparsity of the user-item matrix by training a Naive Bayes classifier for each user and using it to predict the unknown ratings in the matrix. We present results from applying our approach to a movie recommendation database provided by the Netflix movie rental service.
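The collaborative filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the function names `pearson` and `predict`, the nested-dictionary layout of the ratings, and the mean-centered weighted-average prediction rule are all hypothetical. The per-user Naive Bayes classifier that fills in unknown ratings is not shown; where no neighbor information is available, this sketch simply falls back to the user's mean rating.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Returns 0.0 when the correlation is undefined (fewer than two
    co-rated items, or zero variance for either user).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    a = [ratings_a[i] for i in common]
    b = [ratings_b[i] for i in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, item):
    """Predict a rating as the user's mean plus a similarity-weighted
    average of the neighbors' deviations from their own means.

    `ratings` maps user id -> {item id -> rating} (assumed layout).
    """
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        numerator += sim * (other_ratings[item] - other_mean)
        denominator += abs(sim)
    if denominator == 0:
        # No usable neighbors: fall back to the user's mean rating.
        return user_mean
    return user_mean + numerator / denominator
```

Calling `predict(ratings, "alice", "m4")` on a small dictionary of ratings returns an estimate for a movie that user has not rated. The result is not clipped to the rating scale, so a real system would likely clamp it to the range 1 to 5.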
Similar Resources
How To Break Anonymity of the Netflix Prize Dataset
As part of the Netflix Prize contest, Netflix recently released a dataset containing movie ratings of a significant fraction of their subscribers. The dataset is intended to be anonymous, and all customer identifying information has been removed. We demonstrate that an attacker who knows only a little bit about an individual subscriber can easily identify this subscriber’s record if it is prese...
The Netflix Prize
In October 2006, Netflix released a dataset containing 100 million anonymous movie ratings and challenged the data mining, machine learning and computer science communities to develop systems that could beat the accuracy of its recommendation system, Cinematch. We briefly describe the challenge itself, review related work and efforts, and summarize visible progress to date. Other potential uses...
A Million Dollar Reward: Accurate Online Prediction of Movie Ratings
Introduction: We explore the issues that are present in the Netflix Prize dataset. The Netflix Prize seeks to substantially improve the accuracy of user movie rating prediction based on their previous movie preferences and ratings [1]. The contest started in October 2006 and seeks to beat the current Netflix recommendation system by 10% in prediction accuracy. Though some teams have improved pr...
Exploring collaborative filters: Neighborhood-based approach
In this project, we study the effectiveness of collaborative filtering mechanisms in the context of the Netflix competition. We focus our attention on a dataset provided by Netflix which includes a training set with more than 100 million 4-tuples: user id, movie id, rating, and date [3]. In the first part of this project, we develop a simple model to predict future ratings of users based on the...
Statistical Analysis and Application of Ensemble Method on the Netflix Challenge
1. Introduction The Netflix Prize project was proposed by Netflix Inc. to seek accurate predictions of movie ratings. As one group in the Stanford Netflix Prize team, our responsibility is to explore useful statistics and data curation in the training data set, and to explore ensemble methods for improving prediction accuracy. We imported the Netflix data into a MySQL database for...